
feat(native): Create a runtime metric for worker uptime to be used for restart alerts #26979

Merged
jaystarshot merged 2 commits into prestodb:master from jja725:add-worker-uptime-metric
Jan 21, 2026

@jja725 jja725 commented Jan 17, 2026

Description

Added a runtime metric presto_cpp.worker_runtime_uptime_secs to track how long a C++ worker has been running since startup. This metric is recorded every 2 seconds via a periodic task.

Motivation and Context

The worker uptime metric exists in the Java worker but was missing in the C++ (Presto native) worker.

Impact

  • New metric: presto_cpp.worker_runtime_uptime_secs - reports the number of seconds since the worker process started
  • Negligible performance impact expected: the metric is recorded by a lightweight background periodic task (every 2 seconds)
  • No public API changes
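
The uptime computation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `uptimeSeconds` and `makeUptimeTask` are hypothetical names, and the `record` callback stands in for whatever the worker uses to publish a metric value.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Whole seconds elapsed between `start` and `now` (truncated toward zero).
// steady_clock is monotonic, so wall-clock adjustments do not affect it.
inline int64_t uptimeSeconds(Clock::time_point start, Clock::time_point now) {
  assert(now >= start && "uptime cannot be negative");
  return std::chrono::duration_cast<std::chrono::seconds>(now - start).count();
}

// Hypothetical helper: builds the callable a periodic scheduler would invoke.
// `record` is an assumed stand-in for the real metric sink.
inline std::function<void()> makeUptimeTask(
    Clock::time_point start,
    std::function<void(int64_t)> record) {
  return [start, record = std::move(record)]() {
    record(uptimeSeconds(start, Clock::now()));
  };
}
```

A scheduler would call the returned task on its interval (2 seconds in this PR); capturing the start time once keeps each invocation cheap.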

Test Plan

  • Deployed to staging cluster and verified the metric is being reported correctly
  • Verified alert integration works with the new metric
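
The PR does not show the alert rule itself. As a rough sketch of how a restart alert could consume an uptime metric, the functions below infer restarts from a time-ordered series of samples. Both function names and both rules are assumptions made for illustration, not the alerting logic verified in the test plan.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts inferred restarts in a time-ordered series of uptime samples.
// Uptime only grows while a process is alive, so any drop between
// consecutive samples implies that a new process started.
inline std::size_t countRestarts(const std::vector<int64_t>& uptimeSamples) {
  std::size_t restarts = 0;
  for (std::size_t i = 1; i < uptimeSamples.size(); ++i) {
    if (uptimeSamples[i] < uptimeSamples[i - 1]) {
      ++restarts;
    }
  }
  return restarts;
}

// Alternative threshold rule: treat a worker as recently restarted when
// its reported uptime is below `thresholdSecs`.
inline bool recentlyRestarted(int64_t uptimeSecs, int64_t thresholdSecs) {
  return uptimeSecs < thresholdSecs;
}
```

The drop-based rule needs at least two samples per worker, while the threshold rule works on a single sample but depends on choosing a threshold longer than the scrape interval.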

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Add worker uptime metric `presto_cpp.worker_runtime_uptime_secs` to track worker process runtime.

@jja725 jja725 requested review from a team as code owners January 17, 2026 00:56
linux-foundation-easycla bot commented Jan 17, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

sourcery-ai bot commented Jan 17, 2026

Reviewer's Guide

Adds a new periodic metric in the C++ Presto worker that reports process uptime in seconds, aligning with an existing Java worker metric for restart alerting.

Sequence diagram for worker uptime metric recording

sequenceDiagram
  participant PrestoServer
  participant PeriodicTaskManager
  participant UptimeTask
  participant MetricRegistry

  PrestoServer->>PeriodicTaskManager: addTask(worker_runtime_uptime_secs, interval=2_000_000)
  Note over PrestoServer,PeriodicTaskManager: Task captures start_ time from PrestoServer

  loop Every 2 seconds
    PeriodicTaskManager->>UptimeTask: invoke()
    UptimeTask->>UptimeTask: seconds = now() - start_
    UptimeTask->>MetricRegistry: RECORD_METRIC_VALUE(kCounterWorkerRuntimeUptimeSecs, seconds)
  end

Class diagram for uptime metric and counters

classDiagram
  class PrestoServer {
    - start_ std::chrono::steady_clock::time_point
    - periodicTaskManager_ PeriodicTaskManager*
    + addServerPeriodicTasks() void
  }

  class PeriodicTaskManager {
    + addTask(taskFunction, intervalMicros int64_t, name std::string) void
  }

  class Counters {
  }

  class MetricRegistry {
    + registerPrestoMetrics() void
    + DEFINE_METRIC(counterName, statType) void
    + RECORD_METRIC_VALUE(counterName, value) void
  }

  class WorkerRuntimeUptimeMetric {
    + kCounterWorkerRuntimeUptimeSecs folly::StringPiece
  }

  PrestoServer --> PeriodicTaskManager : uses
  PrestoServer --> WorkerRuntimeUptimeMetric : records
  WorkerRuntimeUptimeMetric --> MetricRegistry : defined_in

  MetricRegistry <|-- Counters

  %% Constants represented as static members
  class PartitionedOutputBufferMetrics {
    + kCounterPartitionedOutputBufferGetDataLatencyMs std::string_view
  }

  PartitionedOutputBufferMetrics --> MetricRegistry : defined_in
  WorkerRuntimeUptimeMetric --> PartitionedOutputBufferMetrics : similar_metric_group

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Introduce worker runtime uptime metric and periodic reporting task in the C++ worker. | • Add a new periodic task in PrestoServer::addServerPeriodicTasks to compute uptime since process start using std::chrono::steady_clock and record it every 2 seconds<br>• Use RECORD_METRIC_VALUE to publish the uptime value under a dedicated metric name | presto-native-execution/presto_cpp/main/PrestoServer.cpp |
| Define and register the new worker uptime metric in the C++ metrics system. | • Add a new metric identifier constant for worker runtime uptime seconds<br>• Register the new metric in registerPrestoMetrics with StatType::AVG so it is exposed through the existing metrics infrastructure | presto-native-execution/presto_cpp/main/common/Counters.h<br>presto-native-execution/presto_cpp/main/common/Counters.cpp |
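
The define-then-record pattern summarized above can be illustrated with a small stand-in. The metric name and constant name come from this PR; everything else (`MiniRegistry`, its methods, and the two-value `StatType` enum) is a hypothetical simplification and not the API of the real metrics library behind `DEFINE_METRIC` and `RECORD_METRIC_VALUE`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>

// Metric name from the PR description, declared as std::string_view to
// match the surrounding counters as the review suggests.
inline constexpr std::string_view kCounterWorkerRuntimeUptimeSecs{
    "presto_cpp.worker_runtime_uptime_secs"};

// Stand-in aggregation kinds; the real library defines its own enum.
enum class StatType { AVG, SUM };

// Hypothetical in-memory registry used only for this sketch.
class MiniRegistry {
 public:
  // Registers a metric with its aggregation type (done once at startup).
  void define(std::string_view name, StatType type) {
    stats_[std::string(name)] = Stat{type, 0, 0};
  }

  // Records one observation for a previously defined metric.
  void record(std::string_view name, int64_t value) {
    auto it = stats_.find(std::string(name));
    assert(it != stats_.end() && "metric must be defined before recording");
    it->second.sum += value;
    ++it->second.count;
  }

  // Returns the aggregated value: the sum for SUM, the mean for AVG.
  double read(std::string_view name) const {
    const Stat& s = stats_.at(std::string(name));
    if (s.type == StatType::SUM || s.count == 0) {
      return static_cast<double>(s.sum);
    }
    return static_cast<double>(s.sum) / static_cast<double>(s.count);
  }

 private:
  struct Stat {
    StatType type;
    int64_t sum;
    int64_t count;
  };
  std::map<std::string, Stat> stats_;
};
```

Separating definition from recording means the periodic task only needs the name constant, while the aggregation type is decided in one place at registration time.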

@jja725 jja725 force-pushed the add-worker-uptime-metric branch from 997e20d to 4a52a1a Compare January 17, 2026 01:00
@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The new uptime metric is registered as StatType::AVG, but since this is a monotonically increasing uptime value it would be more accurate to use a gauge-like type (e.g., LAST/MAX) rather than an average.
  • In Counters.h the new kCounterWorkerRuntimeUptimeSecs constant uses folly::StringPiece while surrounding counters use std::string_view; consider using the same type for consistency unless there is a specific reason to differ.
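
To make the first point concrete, the sketch below compares how three aggregations summarize the same monotonically increasing samples. The sample values, the 10-second reporting window, and the helper names are assumptions for illustration; whether the metrics library actually offers a LAST or MAX stat type is not confirmed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Uptime samples a 2-second periodic task might emit during one
// hypothetical 10-second reporting window, starting at 100 s of uptime.
inline std::vector<int64_t> windowSamples() {
  return {100, 102, 104, 106, 108};
}

// Mean of the samples, which is what an AVG-style stat reports.
inline double averageOf(const std::vector<int64_t>& v) {
  assert(!v.empty());
  const int64_t sum = std::accumulate(v.begin(), v.end(), int64_t{0});
  return static_cast<double>(sum) / static_cast<double>(v.size());
}

// Largest sample in the window.
inline int64_t maxOf(const std::vector<int64_t>& v) {
  assert(!v.empty());
  return *std::max_element(v.begin(), v.end());
}

// Most recent sample in the window.
inline int64_t lastOf(const std::vector<int64_t>& v) {
  assert(!v.empty());
  return v.back();
}
```

For these samples the average is 104 while both the maximum and the last value are 108, so an averaged uptime lags the true value by roughly half the window. Whether that lag matters depends on how the alert uses the metric.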

@jja725 jja725 changed the title Create a runtime metric for worker uptime to be used for restart alerts feature: Create a runtime metric for worker uptime to be used for restart alerts Jan 17, 2026
@jja725 jja725 force-pushed the add-worker-uptime-metric branch from 4a52a1a to 0945696 Compare January 17, 2026 01:34
@jja725 jja725 changed the title feature: Create a runtime metric for worker uptime to be used for restart alerts feat(metrics): Create a runtime metric for worker uptime to be used for restart alerts Jan 17, 2026
@jja725 jja725 changed the title feat(metrics): Create a runtime metric for worker uptime to be used for restart alerts feat(native): Create a runtime metric for worker uptime to be used for restart alerts Jan 17, 2026
jja725 commented Jan 17, 2026

@beinan @jaystarshot

Summary:
Metrics change only.
Added a runtime metric for worker uptime called presto_cpp_worker_runtime_uptime_secs.
The metric is present in the Java worker but missing in the C++ worker, which left C++ workers without the corresponding alert.

Test Plan:
Deployed to a staging cluster and checked the metric's status. {F1247554645}
Impact: none expected, since this runs in a background thread.

Reviewers: #ldap_velox-core, jay.narale

Reviewed By: #ldap_velox-core, jay.narale

JIRA Issues: PRESTO-9381

Differential Revision: https://code.uberinternal.com/D20790631
@jja725 jja725 force-pushed the add-worker-uptime-metric branch from 0945696 to 72dcc86 Compare January 17, 2026 01:43
@jaystarshot
Member

LGTM, cc: @aditi-pandit can you please help review

@jja725 jja725 force-pushed the add-worker-uptime-metric branch from 8bd1b03 to f542457 Compare January 17, 2026 05:37
@aditi-pandit
Contributor

Hi @jja725: I'm fine with your code change, though I'm curious, as you mention this broke existing Java code. Can you point me to that code? Or was it Uber-internal?

jja725 commented Jan 18, 2026

> Hi @jja725: I'm fine with your code change, though I'm curious, as you mention this broke existing Java code. Can you point me to that code? Or was it Uber-internal?

It's just Uber-internal.

@aditi-pandit aditi-pandit left a comment

Thanks @jja725

@steveburnett
Contributor

Are these runtime metrics documented anywhere? I didn't notice when I looked.

If they are not, should they be?

jja725 commented Jan 21, 2026

> Are these runtime metrics documented anywhere? I didn't notice when I looked.
>
> If they are not, should they be?

I don't think there is any documentation for these metrics. We probably need a separate PR to add it.

steveburnett commented Jan 21, 2026

> > Are these runtime metrics documented anywhere? I didn't notice when I looked.
> > If they are not, should they be?
>
> I don't think there is any documentation for these metrics. We probably need a separate PR to add it.

Yes, I agree. Given that this PR is approved and ready to merge, there's no need to hold it up now. I will open an issue to track the needed doc improvement.

@jaystarshot jaystarshot merged commit e3db5a2 into prestodb:master Jan 21, 2026
116 of 120 checks passed